4.3 Q8: Exploring Time-Series Analysis with ARIMA and VAR

Building on the univariate LSTM analysis in the previous section, this section applies ARIMA models to the individual series and a VAR model to explore multivariate effects. The VAR model lets us examine how Reddit post content relates to Dogecoin price by modeling several time series jointly, giving a more comprehensive view of how social media may influence cryptocurrency markets.

4.3.1 ARIMA(p, d, q) Model Equation

The ARIMA model combines autoregressive (AR) elements, differencing for stationarity (I), and moving average (MA) components. It is denoted as ARIMA(p, d, q), where:

$p$: Number of autoregressive terms
$d$: Number of nonseasonal differences needed for stationarity
$q$: Number of lagged forecast errors in the prediction equation

Differencing (I)

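To achieve stationarity, the series is differenced $d$ times. The first difference is:

\[ \nabla y_t = y_t - y_{t-1} \]

Higher-order differences apply the operator repeatedly, $\nabla^d y_t = \nabla^{d-1}(\nabla y_t)$, and each pass shortens the series by one observation.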
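As a minimal sketch of the differencing step (plain Python, no time-series library assumed), the helper below applies $\nabla$ repeatedly. The function name `difference` and the toy series are hypothetical; a quadratic series such as $y_t = t^2$ needs $d = 2$ before it becomes constant.

```python
def difference(series, d=1):
    """Apply the differencing operator d times.

    Each pass maps y_t to y_t - y_{t-1}, so the output is d elements shorter.
    d = 0 returns a copy of the input unchanged.
    """
    out = list(series)
    for _ in range(d):
        # Pair each value with its predecessor and subtract
        out = [curr - prev for prev, curr in zip(out[:-1], out[1:])]
    return out


# Hypothetical series with a quadratic trend (y_t = t^2)
squares = [1, 4, 9, 16, 25]
print(difference(squares, d=1))  # first differences still trend upward
print(difference(squares, d=2))  # second differences are constant
```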

Autoregressive (AR) Part

The AR part involves using $p$ past values:

\[ \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} \]

Moving Average (MA) Part

The MA part incorporates the errors from $q$ past forecasts:

\[ \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q} \]

where $\epsilon_{t-1},\epsilon_{t-2}, \dots$ are the error terms from previous forecasts, and $\theta_1, \theta_2, \dots, \theta_q$ are coefficients.
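The AR and MA parts combine into a one-step prediction, and the MA part depends on past forecast errors that must themselves be computed recursively. The sketch below shows that recursion for a general ARMA(p, q) on an already differenced series. It assumes pre-sample values are zero (a conditional-sum-of-squares style simplification), so it illustrates the structure rather than the exact maximum-likelihood procedure a statistics package would use; the name `arma_residuals` is hypothetical.

```python
def arma_residuals(y, c, phi, theta):
    """Return one-step-ahead errors eps_t = y_t - yhat_t for an ARMA(p, q) model.

    yhat_t = c + sum_i phi[i] * y[t-1-i] + sum_j theta[j] * eps[t-1-j]
    Terms that would reach before the start of the sample are skipped,
    which is equivalent to treating pre-sample y and eps as zero.
    """
    p, q = len(phi), len(theta)
    eps = []
    for t in range(len(y)):
        # AR part: weighted sum of the p most recent observed values
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # MA part: weighted sum of the q most recent forecast errors
        ma_part = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps.append(y[t] - (c + ar_part + ma_part))
    return eps


# Hypothetical ARMA(1, 1) coefficients and data, for illustration only
print(arma_residuals([1.0, 2.0, 3.0], c=0.0, phi=[0.5], theta=[0.5]))
```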

Combined ARIMA Equation

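Combining these components, the ARIMA(p, d, q) model for the differenced series $y_t' = \nabla^d y_t$ is:

\[ y_t' = c + \sum_{i=1}^{p} \phi_i y_{t-i}' + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t \]

We fit two ARIMA models, with price and buy signals as the dependent variables: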

Model for price ($p=1$, $d=1$, $q=1$)

\[ price_t' = c + \phi_1 price_{t-1}' + \theta_1 \epsilon_{t-1} + \epsilon_t \]

Model for buy signals ($p=1$, $d=0$, $q=1$)

Because buy_signals is already stationary, it does not need to be differenced ($d=0$).

\[ buy\_signals_t = c + \phi_1 buy\_signals_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t \]

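where $price_t' = price_t - price_{t-1}$ is the first-differenced price series ($d = 1$), $buy\_signals_t$ enters in levels ($d = 0$), $c$ is a constant, and $\epsilon_t$ is the error term at time $t$.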
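Because the price model is fitted to differences, a forecast of the differenced series has to be added back to the last observed price. Below is a minimal sketch of a one-step-ahead forecast for ARIMA(1, 1, 1), using the sign convention of the equations above (the MA term enters as $+\theta_1 \epsilon_{t-1}$). The function name and example inputs are hypothetical.

```python
def arima111_forecast(last_price, prev_price, last_error, c, phi, theta):
    """One-step-ahead price forecast from an ARIMA(1, 1, 1) model.

    The model describes first differences price'_t = price_t - price_{t-1}:
        price'_{t+1} = c + phi * price'_t + theta * eps_t + eps_{t+1}
    The future error eps_{t+1} has expectation zero, so it drops out.
    The forecast difference is then added back to the last observed level.
    """
    last_diff = last_price - prev_price
    diff_forecast = c + phi * last_diff + theta * last_error
    return last_price + diff_forecast


# Hypothetical inputs, for illustration only
print(arima111_forecast(last_price=10.0, prev_price=8.0, last_error=1.0,
                        c=0.0, phi=0.5, theta=0.25))
```

For the buy-signal model ($d = 0$) the undifferencing step is skipped: the forecast is simply $c + \phi_1 y_t + \theta_1 \epsilon_t$.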

Core Results of ARIMA Models

Table Summary

Result	Model	Dep. Variable	Observations	AIC	BIC	Log Likelihood	Const	AR.L1	MA.L1	Sigma^2
Res 1	ARIMA(1, 0, 1)	`n_buy_sig`	6586	45463.716	45490.887	-22727.858	11.5079	0.8265	-0.1053	58.2006
Res 2	ARIMA(1, 1, 1)	`price`	6586	-67490.359	-67469.981	33748.179	NA	-0.4687	0.4380	2.07e-06
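As a consistency check on the table, AIC $= 2k - 2\ln L$ and BIC $= k\ln n - 2\ln L$, where $k$ is the number of estimated parameters and $n$ the number of observations. The reported values are consistent with $k = 4$ for Res 1 (const, AR.L1, MA.L1, $\sigma^2$) and $k = 3$ for Res 2 (no constant), taking $n = 6586$; the sketch below reproduces them to within rounding of the log-likelihood. These parameter counts are inferred from the table, not stated in the original output.

```python
import math


def aic(loglik, k):
    # Akaike information criterion: 2k - 2 ln L
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    # Bayesian information criterion: k ln(n) - 2 ln L
    return k * math.log(n) - 2 * loglik


# (log-likelihood, assumed parameter count k, observations n) taken from the table
rows = {
    "Res 1": (-22727.858, 4, 6586),  # const, AR.L1, MA.L1, sigma^2
    "Res 2": (33748.179, 3, 6586),   # AR.L1, MA.L1, sigma^2 (no constant)
}
for name, (ll, k, n) in rows.items():
    print(name, round(aic(ll, k), 3), round(bic(ll, k, n), 3))
```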

Interpretation of Results

Res 1 (n_buy_sig): The ARIMA(1, 0, 1) model for n_buy_sig has a significant AR1 coefficient of 0.8265, indicating strong persistence: a large share of each period's buy-signal count carries over to the next period. The negative MA1 coefficient (-0.1053) implies a small correction in the opposite direction of the previous period's forecast error. The AIC and BIC are large in absolute terms, but this reflects the scale of the series (residual variance $\sigma^2 = 58.2$) rather than model complexity; information criteria are only comparable between models fitted to the same dependent variable.
Res 2 (price): For price, modeled with ARIMA(1, 1, 1), the AR1 (-0.4687) and MA1 (0.4380) coefficients are significant but nearly equal in magnitude with opposite signs, so they largely offset each other. The implied model for price changes is therefore close to white noise, which is consistent with price behaving approximately like a random walk. The strongly negative AIC and BIC and the high log-likelihood mainly reflect the very small residual variance ($\sigma^2 = 2.07 \times 10^{-6}$), that is, the small size of price changes; they should not be compared with Res 1 or read on their own as evidence of strong forecasting power.
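To see how strongly the AR and MA terms offset each other, one can compute the MA(∞) weights $\psi_j$ of an ARMA(1, 1) written as in the equations above, $x_t = \phi_1 x_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1}$: $\psi_0 = 1$, $\psi_1 = \phi_1 + \theta_1$, and $\psi_j = \phi_1 \psi_{j-1}$ for $j \ge 2$. A minimal sketch using the coefficients from the table (the constant is omitted because it does not affect the weights; the function name is hypothetical):

```python
def arma11_psi_weights(phi, theta, horizon):
    """MA(infinity) weights psi_0..psi_horizon of x_t = phi*x_{t-1} + eps_t + theta*eps_{t-1}.

    psi_j measures how much a unit shock eps_t still affects x_{t+j}.
    """
    psi = [1.0]
    if horizon >= 1:
        psi.append(phi + theta)
    for _ in range(2, horizon + 1):
        # Beyond lag 1 the MA term no longer contributes; only the AR decay remains
        psi.append(phi * psi[-1])
    return psi


# AR.L1 and MA.L1 from the table
price_psi = arma11_psi_weights(-0.4687, 0.4380, 3)  # Res 2, differenced price
buy_psi = arma11_psi_weights(0.8265, -0.1053, 3)    # Res 1, n_buy_sig
print(price_psi)
print(buy_psi)
```

With the Res 2 coefficients, $\psi_1 = -0.4687 + 0.4380 = -0.0307$, so a shock to price changes has almost no effect beyond the current period; with the Res 1 coefficients, $\psi_1 = 0.7212$ and the effect of a shock on n_buy_sig decays slowly.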

4.3.2 Vector Autoregression (VAR)

Vector Autoregression (VAR) is a statistical model used to capture the linear interdependencies among multiple time series. VAR models generalize the univariate autoregressive (AR) model to more than one evolving variable: each variable is a linear function of past lags of itself and past lags of the other variables. This makes VAR suitable for systems where the variables influence each other.

A VAR model describes each variable with an equation that combines:

The variable’s own lags (autoregressive part).
The lags of other variables in the system.

To illustrate the standard form, a two-variable VAR(1) system in $y_t$ and $x_t$ can be written as:

\[ \begin{align*} y_t &= c_1 + \phi_{11} y_{t-1} + \phi_{12} x_{t-1} + \epsilon_{1t} \\ x_t &= c_2 + \phi_{21} y_{t-1} + \phi_{22} x_{t-1} + \epsilon_{2t} \end{align*} \]

where $c_1$ and $c_2$ are constants (the intercepts of the equations); $\phi_{11}$, $\phi_{12}$, $\phi_{21}$, and $\phi_{22}$ are the coefficients on the lagged values of $y$ and $x$ ($\phi_{12}$, for instance, measures the effect of $x_{t-1}$ on $y_t$); and $\epsilon_{1t}$ and $\epsilon_{2t}$ are the error terms, assumed to be white noise.
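A minimal sketch of how the two-equation VAR(1) above produces a forecast: each variable's next value is its intercept plus the coefficient-weighted lagged values of both variables, and multi-step forecasts feed the predictions back in. The function name, coefficients, and starting values are hypothetical, not the fitted model.

```python
def var1_forecast(y_prev, x_prev, c, phi):
    """One-step forecast of a two-variable VAR(1).

    c   = (c_1, c_2)
    phi = ((phi_11, phi_12), (phi_21, phi_22))
    The error terms have mean zero and drop out of the forecast.
    """
    y_next = c[0] + phi[0][0] * y_prev + phi[0][1] * x_prev
    x_next = c[1] + phi[1][0] * y_prev + phi[1][1] * x_prev
    return y_next, x_next


# Hypothetical coefficients and starting values, for illustration only
c = (1.0, 0.0)
phi = ((0.5, 0.25), (0.0, 0.5))
y, x = 2.0, 4.0
for step in range(1, 4):
    # Multi-step forecast: feed each prediction back in as the next lag
    y, x = var1_forecast(y, x, c, phi)
    print(step, y, x)
```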

Condensed Summary of VAR Model Results

Equation	Term	Coefficient	Std. Error	t-stat	Prob
price_s	L1.price_s	-0.027636	0.012353	-2.237	0.025
price_s	L2.price_s	0.031450	0.012355	2.546	0.011
price_s	L7.price_s	0.041803	0.012343	3.387	0.001
price_s	L9.price_s	0.033151	0.012362	2.682	0.007
price_s	L11.price_s	-0.025686	0.012359	-2.078	0.038
n_buy_sig	L1.n_buy_sig	0.707021	0.012347	57.265	0.000
n_buy_sig	L2.price_s	-136.544667	64.626404	-2.113	0.035
n_buy_sig	L4.n_buy_sig	0.068851	0.015113	4.556	0.000
n_buy_sig	L7.price_s	178.560081	64.568315	2.765	0.006
n_buy_sig	L10.n_buy_sig	0.034079	0.015100	2.257	0.024
n_buy_sig	L11.n_buy_sig	0.036609	0.012330	2.969	0.003
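Each t-stat in the table is the coefficient divided by its standard error, and Prob is read here as a two-sided p-value. With 6,586 observations, a standard normal approximation reproduces the listed values; the sketch below assumes that approximation rather than claiming which distribution the original software used. The helper name `t_and_p` is hypothetical.

```python
import math


def t_and_p(coef, std_err):
    """t-statistic and two-sided p-value under a standard normal approximation.

    erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)), the two-sided tail probability.
    """
    t = coef / std_err
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Rows copied from the table
print(t_and_p(-0.027636, 0.012353))   # L1.price_s in the price_s equation
print(t_and_p(178.560081, 64.568315))  # L7.price_s in the n_buy_sig equation
```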

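Retaining only the terms listed in the table (any intercept terms and the remaining lags are omitted), the estimated equations are:

Equation for `price_s`

\[ price_{s,t} = -0.027636 \cdot price_{s,t-1} + 0.031450 \cdot price_{s,t-2} + 0.041803 \cdot price_{s,t-7} + 0.033151 \cdot price_{s,t-9} - 0.025686 \cdot price_{s,t-11} + \epsilon_{1t} \]

Equation for `n_buy_sig`

\[ n\_buy\_sig_t = 0.707021 \cdot n\_buy\_sig_{t-1} - 136.544667 \cdot price_{s,t-2} + 0.068851 \cdot n\_buy\_sig_{t-4} + 178.560081 \cdot price_{s,t-7} + 0.034079 \cdot n\_buy\_sig_{t-10} + 0.036609 \cdot n\_buy\_sig_{t-11} + \epsilon_{2t} \]

No lag of n_buy_sig appears among the listed terms of the price_s equation, whereas lags 2 and 7 of price_s appear in the n_buy_sig equation.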
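To make the lag indexing concrete, the sketch below evaluates the condensed equations for a given history. It is illustrative only: the dictionary layout and the function name `evaluate` are hypothetical, and because intercepts and unlisted lags are omitted the result is not a full VAR forecast.

```python
# Coefficients copied from the condensed table; inner keys are lag numbers.
PRICE_EQ = {
    "price_s": {1: -0.027636, 2: 0.031450, 7: 0.041803, 9: 0.033151, 11: -0.025686},
}
BUY_EQ = {
    "n_buy_sig": {1: 0.707021, 4: 0.068851, 10: 0.034079, 11: 0.036609},
    "price_s": {2: -136.544667, 7: 178.560081},
}


def evaluate(equation, history):
    """Sum coef * value over the listed lagged terms of one equation.

    history[name][k] holds the value of `name` k periods ago (k >= 1).
    Intercepts and unlisted lags are not included.
    """
    return sum(coef * history[name][lag]
               for name, lags in equation.items()
               for lag, coef in lags.items())


# Hypothetical history: every lag of both variables equal to 1.0
ones = {name: {k: 1.0 for k in range(1, 12)} for name in ("price_s", "n_buy_sig")}
print(evaluate(PRICE_EQ, ones), evaluate(BUY_EQ, ones))
```

On this all-ones history the price_s terms sum to about 0.053, while the n_buy_sig terms sum to about 42.86, almost all of it from the two price_s coefficients.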

Impulse Response Functions (IRF)

In Vector Autoregression (VAR) models, an impulse response function (IRF) traces how each endogenous variable responds, over subsequent periods, to a one-unit shock in the error term of one variable, with all other shocks set to zero. The IRF charts discussed below display, over several hours, the response of price_s (price differenced for stationarity) to a shock in n_buy_sig and the response of n_buy_sig to a shock in price_s.
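For a VAR(1) $y_t = c + \Phi y_{t-1} + \epsilon_t$, the response $h$ periods after a one-unit shock to one variable's error term is the corresponding column of $\Phi^h$. The fitted model here has many more lags; the sketch below uses a hypothetical VAR(1) for clarity (a VAR(p) can be handled the same way through its companion form) and computes non-orthogonalized responses. Orthogonalized IRFs instead scale shocks using a Cholesky factor of the residual covariance matrix. The function names are hypothetical.

```python
def matmul(a, b):
    # Plain n x n matrix product without external libraries
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def var1_irf(phi, horizon):
    """Non-orthogonalized impulse responses of a VAR(1) y_t = c + Phi y_{t-1} + eps_t.

    Returns [Phi^0, Phi^1, ..., Phi^horizon]; entry [h][i][j] is the response of
    variable i, h periods after a one-unit shock to variable j's error term.
    """
    n = len(phi)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    irfs = [power]
    for _ in range(horizon):
        power = matmul(phi, power)  # Phi^h = Phi * Phi^(h-1)
        irfs.append(power)
    return irfs


# Hypothetical VAR(1) coefficients with variables ordered (price_s, n_buy_sig)
phi = [[0.5, 0.0], [0.2, 0.5]]
for h, resp in enumerate(var1_irf(phi, 3)):
    # resp[1][0]: response of n_buy_sig to a unit shock in price_s
    print(h, resp[1][0])
```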

The components of an IRF plot are explained below:

X Axis: Displays the time intervals following a shock (hours)
Y Axis: Measures the magnitude of the response, in the units of the responding variable (price units for price_s, number of buy signals for n_buy_sig)
Blue Line: Represents the estimated response of the variable to the shock across different time periods.
Dashed Lines: These indicate the confidence intervals, showing the range where the true response likely falls, typically with a 95% confidence level.

A response line that crosses zero indicates that the direction of the response changes over time.

If the confidence interval includes zero at a given horizon, the response is not statistically significant at that point.
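The two reading rules above can be written as a small helper: a horizon is flagged as significant when its confidence band excludes zero, and a sign change is flagged when consecutive point estimates have strictly opposite signs. The function name and example arrays are hypothetical; in practice the inputs would come from the IRF estimates behind the charts.

```python
def summarize_irf(point, lower, upper):
    """Classify each horizon of an impulse response.

    point, lower, upper: equal-length lists of the estimated response and the
    confidence bounds at horizons 0, 1, 2, ...
    Returns (significant_horizons, sign_change_horizons).
    """
    # Significant where the whole band lies strictly above or below zero
    significant = [h for h, (lo, hi) in enumerate(zip(lower, upper)) if lo > 0 or hi < 0]
    # Sign change where consecutive point estimates have strictly opposite signs
    sign_changes = [h for h in range(1, len(point)) if point[h - 1] * point[h] < 0]
    return significant, sign_changes


# Hypothetical IRF estimates and 95% bounds, for illustration only
point = [0.0, -0.5, 0.2, -0.1]
lower = [0.0, -0.8, -0.1, -0.3]
upper = [0.0, -0.2, 0.5, 0.1]
print(summarize_irf(point, lower, upper))
```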

In the first IRF graph, which shows the response of “price_s” to shocks in “n_buy_sig”, the response initially drops below zero, indicating a negative impact, and then oscillates around zero. This pattern suggests an immediate negative reaction followed by no consistent direction or magnitude of impact.

The second IRF graph examines the response of “n_buy_sig” to shocks in “price_s.” The response begins at zero, dips into negative territory, and then oscillates while gradually returning toward zero. The diminishing amplitude suggests that the impact of the shock fades over time.